A Statistical Approach for Efficient Crawling of Rich Internet Applications
Abstract
Modern web technologies, like AJAX, result in more responsive and usable web applications, sometimes called Rich Internet Applications (RIAs). Traditional crawling techniques are not sufficient for crawling RIAs. We present a new strategy for crawling RIAs. The strategy builds on the concept of “Model-Based Crawling” introduced in [3] and uses statistics accumulated during the crawl to choose what to explore next, favoring actions with a high probability of uncovering new information. We compare the performance of this strategy with our previous strategy, as well as with the classical Breadth-First and Depth-First strategies, on two real RIAs and two test RIAs. The results show that the new strategy is significantly better than the Breadth-First and Depth-First strategies (which are widely used to crawl RIAs), and that it outperforms our previous strategy while being much simpler to implement.
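The idea of using statistics gathered during the crawl to pick the next action can be sketched as follows. This is only an illustrative sketch, not the strategy as published: the class `EventStats`, the function `choose_next`, the smoothed success-ratio estimate, and the prior values are all assumptions made for this example, and the sketch ignores the cost of reaching the state where an event is available.

```python
from collections import defaultdict


class EventStats:
    """Per-event counters accumulated during the crawl (hypothetical helper)."""

    def __init__(self, prior_successes=1.0, prior_trials=2.0):
        # Assumed pseudo-counts: an event never executed gets probability 0.5.
        self.prior_successes = prior_successes
        self.prior_trials = prior_trials
        self.trials = defaultdict(int)     # times the event was executed
        self.successes = defaultdict(int)  # times it led to a new state

    def record(self, event, found_new_state):
        """Update the counters after executing `event` once."""
        self.trials[event] += 1
        if found_new_state:
            self.successes[event] += 1

    def probability(self, event):
        """Smoothed estimate that executing `event` uncovers a new state."""
        return (self.successes[event] + self.prior_successes) / (
            self.trials[event] + self.prior_trials
        )


def choose_next(unexecuted, stats):
    """Pick the (state, event) pair whose event has the highest estimate.

    `unexecuted` is a list of (state, event) pairs not yet explored.
    """
    return max(unexecuted, key=lambda pair: stats.probability(pair[1]))
```

A crawler built this way would call `record` after every event execution and `choose_next` whenever it must decide where to go, so events that have often revealed new states are tried first in the states where they remain unexplored.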
Related resources
A Statistical Approach for Efficient Crawling of Rich Internet Applications
Modern web technologies, like AJAX, result in more responsive and usable web applications, sometimes called Rich Internet Applications (RIAs). Traditional crawling techniques are not sufficient for crawling RIAs. We present a new strategy for crawling RIAs. This new strategy is designed based on the concept of “Model-Based Crawling” introduced in [3] and uses statistics accumulated during the cr...
A Strategy for Efficient Crawling of Rich Internet Applications
This thesis studies the problem of crawling rich internet applications. These applications are built using advanced web technologies which allow them to be more dynamic and enable better user experiences. In recent years, the popularity and importance of web applications have continually increased, and they are now very commonly used to complete essential tasks such as financial transactions. As ...
Indexing Rich Internet Applications Using Components-Based Crawling
Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines, with DOMs as states and JavaScript event executions as transitions. This approach fails when used with “real-life”, complex RIAs, because the size of the produced model is m...
Building Rich Internet Applications Models: Example of a Better Strategy
Crawling “classical” web applications is a problem that was addressed more than a decade ago. Efficient crawling of web applications that use advanced technologies such as AJAX (called Rich Internet Applications, RIAs) is still an open problem. Crawling is important not only for indexing content, but also for building models of the applications, which is necessary for automated testing, au...
GDist-RIA Crawler: A Greedy Distributed Crawler for Rich Internet Applications
Crawling web applications is important for indexing, accessibility and security assessment. Crawling traditional web applications is an old problem, for which good and efficient solutions are known. Crawling Rich Internet Applications (RIAs) quickly and efficiently, however, is an open problem. Technologies such as AJAX and partial Document Object Model (DOM) updates only make the problem of craw...